Overview

This session is about the basics of stochastic actor-oriented models (SAOMs). We refer to the RSiena page https://www.stats.ox.ac.uk/~snijders/siena/ for further resources. Here we will

  • Introduce SAOM as a simulation model
  • Present the basic functions and functionalities
  • Briefly discuss an empirical example

It is appropriate that we start by simulating the model, as Siena stands for

Simulation Investigation for Empirical Network Analysis

Packages

We will use functionality from the network packages sna and network (see https://raw.githubusercontent.com/johankoskinen/CHDH-SNA/main/Markdowns/CHDH-SNA-2.Rmd). The main package for SAOMs is RSiena.

R

The first time you use them, install the packages (uncomment the install commands)

# install.packages("sna")
# install.packages("network")
# install.packages("RSiena")

Once packages are installed, load them

library("sna")
library("network")
library('RSiena')

Data set

The RSiena package automatically loads the van de Bunt’s Freshman dataset (see Data description). We will use time-points 3 and 4

?tmp3 # opens the help file with information about the dataset
class(tmp3) # both time-points are regular adjacency matrices
class(tmp4)
dim(tmp3) # 32 students by 32 students
n <- dim(tmp4)[1]

Data checking

The original tie-variables had 6 levels but have here been dichotomised. It is good practice to check that ties are binary

table(tmp3,useNA = 'always')
table(tmp4,useNA = 'always')

Missing

RSiena handles missing data in estimation, but for the purposes of simulating and investigating the network, we replace missing values with zeros

tmp3[is.na(tmp3)] <- 0 # replace missing values with 0
tmp4[is.na(tmp4)] <- 0 # replace missing values with 0

Plot change

Plotting the networks with nodes in the same places illustrates what the SAOM will try to model

par(mfrow = c(1,2))
coordin <- plot(as.network(tmp3), main='Time 3')
plot(as.network(tmp4),coord=coordin, main='Time 4')

Looking closely, we see that some arcs have been added and others removed

In this session, we will assume that we have one initial network \(X(t_0)=x_{obs}(t_0)\), at time \(t_0\) and that we want to say something about a network \(X(t_1)\), at time \(t_1\), \(t_0<t_1\).

We will only use two waves but because of the Markov property of the model, all ingredients extend without limitations to several waves.

Format data

To simulate (but also estimate) we need to execute, in turn, each of the functions

  1. sienaDependent - formats class matrix to class sienaDependent
  2. sienaDataCreate - formats data to class siena
  3. getEffects - provides us with the effects we can use for the process
  4. sienaAlgorithmCreate - sets up simulation/estimation settings

After these steps, the model is simulated/estimated using siena07

sienaDependent

The function sienaDependent formats and translates the two adjacency matrices to a “sienaDependent” object:

mynet1 <- sienaDependent(array(c(tmp3, tmp4), # the matrices in order
                               dim=c(32, 32,2))# are set as two slices in an array
                         )

For networks, sienaDependent takes the networks glued together in an \(n \times n \times (\text{number of waves})\) array.

You can define, as dependent variables, one-mode networks, two-mode networks, or dependent monadic variables

sienaDataCreate

Once you have defined your variables, these are combined into a siena object using sienaDataCreate

mydata <- sienaDataCreate(mynet1)

The siena object adds all relevant information about the data

mydata
## Dependent variables:  mynet1 
## Number of observations: 2 
## 
## Nodeset                  Actors 
## Number of nodes              32 
## 
## Dependent variable mynet1   
## Type               oneMode  
## Observations       2        
## Nodeset            Actors   
## Densities          0.15 0.18

Simulate

getEffects

To determine which effects are available for our combination of different types of data, run

myeff <- getEffects(mydata)

Assume a model where an actor \(i\), when they make a change, only cares about how many ties they have.

myeff <- includeEffects(myeff, recip,include=FALSE)# remove reciprocity which is DEFAULT
## [1] effectName   include      fix          test         initialValue
## [6] parm        
## <0 rows> (or 0-length row.names)
myeff$initialValue[myeff$shortName=='Rate'] <- 5.7288
myeff$initialValue[myeff$shortName=='density'][1] <- -0.7349

For later reference, notice how myeff appears both inside the call to includeEffects and on the left-hand side of the assignment, so the updated effects object overwrites the old one

Waiting times

What does the rate \(\lambda = 5.7288\) mean?

Each time the network has changed (or there has been an opportunity to change), each actor waits \(T_i \overset{i.i.d.}{\thicksim} Exp(\lambda)\) time.

The actor that gets to change is the one who waits for the shortest amount of time

waiting <- rexp(32, 5.7288)
hist(waiting)

winner <- which( waiting == min(waiting))
paste('The winner is actor: ', winner)
## [1] "The winner is actor:  19"
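The time to the next change is the minimum of the \(n\) waiting times; for i.i.d. \(Exp(\lambda)\) variables this minimum is itself \(Exp(n\lambda)\), so the expected time between micro-steps is \(1/(n\lambda)\). A minimal base-R sketch checking this by simulation (the number of replications, 1e5, is an arbitrary choice):

```r
set.seed(1)                 # for reproducibility
lambda <- 5.7288            # the rate parameter used above
n <- 32                     # number of actors
# simulate many rounds and record the winner's waiting time in each
min.times <- replicate(1e5, min(rexp(n, lambda)))
mean(min.times)             # simulated mean time between micro-steps
1 / (n * lambda)            # theoretical value, approximately 0.0055
```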

Micro-step

Consider the example of \(i=1\), deciding between creating a tie to \(j=2\) or \(j=4\), or breaking the tie to \(j=3\)


With our current model

\[ \Pr(X(1\leadsto 2))=\Pr(X(1\leadsto 4))=\frac{e^{\beta(1-2x_{12})}}{1+\sum_{j\neq 1} e^{\beta(1-2x_{1j})}}=\frac{\exp(-0.7349)}{\exp(-0.7349)+\exp(0.7349)+\exp(-0.7349)+1} \] and

\[ \Pr(X(1\leadsto 3))=\frac{e^{\beta(1-2x_{13})}}{1+\sum_{j\neq 1} e^{\beta(1-2x_{1j})}}=\frac{\exp(0.7349)}{\exp(-0.7349)+\exp(0.7349)+\exp(-0.7349)+1} \] The conditional probability of \(i=1\) creating the tie to \(j=2\) is thus

exp(-0.7349)/(exp(-0.7349)+exp(0.7349)+exp(-0.7349)+1)
## [1] 0.1185728

and the conditional probability of \(i=1\) deleting the tie to \(j=3\) is thus

exp(0.7349)/(exp(-0.7349)+exp(0.7349)+exp(-0.7349)+1)
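The four options open to actor 1 in this micro-step (toggle the tie to 2, 3, or 4, or leave the network unchanged) have probabilities that sum to one. A quick base-R check, with the density parameter hard-coded to the value used above:

```r
beta <- -0.7349                    # density parameter
x1 <- c(0, 1, 0)                   # current tie-variables from actor 1 to j = 2, 3, 4
num <- exp(beta * (1 - 2 * x1))    # numerator for toggling each tie
denom <- 1 + sum(num)              # the '1' is the option of changing nothing
p <- c(num, 1) / denom             # probabilities: toggle tie to 2, 3, 4; do nothing
p                                  # first element is 0.1186, as above
sum(p)                             # probabilities sum to 1
```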

sienaAlgorithmCreate

The function sienaAlgorithmCreate determines the simulation/estimation settings. Here we are going to

  • Simulate only: simOnly = TRUE
  • a total of n3 = 100 networks, starting in \(X(t_3)\)

sim_model  <-  sienaAlgorithmCreate( 
                          projname = 'sim_model',# name will be appended to output 
                          cond = FALSE, # NOT conditioning on num. changes
                          useStdInits = FALSE,# we are changing some defaults
                          nsub = 0 ,# more about subphases in estimation
                          n3 = 100,# number of simulations (in Phase 3)
                          simOnly = TRUE)# we only want to simulate

We are ready to simulate!

siena07

The function siena07 is the main engine of RSiena and is used to estimate all models (apart from hierarchical SAOMs)

sim_ans <- siena07( sim_model,# the name of our model
                    data = mydata,# all of our data - see above for what is in there
                    effects = myeff,# the effects we have chosen, including parameters
                    returnDeps = TRUE,# save simulated networks
                    batch=TRUE )# batch=FALSE opens a GUI

Read in a function

The networks are in sim_ans$sims but to help with extracting and formatting them we read in the function reshapeRSienaDeps

source("https://raw.githubusercontent.com/johankoskinen/CHDH-SNA/main/data/reshapeRSienaDeps.R")

We apply it on sim_ans

mySimNets <- reshapeRSienaDeps(sim_ans,n)# n is the number of nodes (defined above)

The object mySimNets is a 100 by 32 by 32 array containing 100 simulated adjacency matrices

Plot simulated networks

Plot networks with the same layout as for \(X(t_3)\) above

par(mfrow=c(2,5),# want 2 by 5 panels
    oma = c(0,1,0,0) + 0.1,# custom outer margins
    mar = c(1,0,0,0) + 0.1)# custom inner margins
plot(as.network(tmp4),# observed network at t4
     coord=coordin) # observed network
invisible(# need to suppress printing to console
  apply(mySimNets[1:9,,],# for the first 9 networks in array
      1,# along first dimension, apply the following function
      function(x)
        plot(as.network(x),coord=coordin) ) )

It is hard to spot any differences. Plot the densities of the simulated networks instead

hist( gden(mySimNets ) )
abline( v = gden(tmp4), col='red')

If actors only care about how many ties they have, we predict the density at \(t_4\) correctly

How about the number of mutual dyads, i.e. pairs with \(x_{ij}x_{ji}=1\)?

hist( dyad.census(mySimNets)[,1] ,xlim=c(9,50))
abline(v = dyad.census(tmp4)[1], col='red')

Clearly, if the only rule actors have is to care about the number of ties they have, this is not enough to explain why there are so many (46) reciprocal dyads
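The count of mutual dyads can also be computed directly from an adjacency matrix: \(x_{ij}x_{ji}\) is one exactly when both arcs are present, so summing it over ordered pairs and halving gives the number of reciprocated dyads. A small sketch on a toy matrix (the helper name count.mutual is made up here; on the observed data it agrees with dyad.census(tmp4)[1]):

```r
# count mutual dyads: pairs i < j with both x[i,j] = 1 and x[j,i] = 1
# (assumes a binary adjacency matrix with a zero diagonal)
count.mutual <- function(x) sum(x * t(x)) / 2
toy <- matrix(0, 4, 4)
toy[1, 2] <- toy[2, 1] <- 1   # one mutual dyad
toy[3, 4] <- 1                # one asymmetric dyad, not counted
count.mutual(toy)             # returns 1
```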

Reciprocity model

Simulate assuming that actors want to have reciprocated ties to others, using the model with probabilities \(\Pr(X(i \leadsto j) \mid i \text{ makes decision})\): \[ p_{ij}(X,\beta)=\frac{e^{\beta_d (1-2x_{ij}) +\beta_r (1-2x_{ij})x_{ji} } }{ \sum_h e^{\beta_d (1-2x_{ih}) +\beta_r (1-2x_{ih})x_{hi} } } \]

with parameters \(\beta_d=-1.1046\) and \(\beta_r=1.2608\):

myeff <- includeEffects(myeff, recip,include=TRUE)# reinstate reciprocity which is DEFAULT
##   effectName  include fix   test  initialValue parm
## 1 reciprocity TRUE    FALSE FALSE          0   0
myeff$initialValue[myeff$shortName =='recip'][1] <- 1.2608
#### === We also need to change the other values
myeff$initialValue[myeff$shortName=='Rate'] <- 6.3477
myeff$initialValue[myeff$shortName=='density'][1] <- -1.1046

Log odds for a new graph

\(x_{ij}\) \ \(x_{ji}\)    0             1
0                          0             \(\beta_d\)
1                          \(\beta_d\)   \(\beta_d+\beta_r\)
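With the parameter values used in this model, the cells of the table can be filled in numerically; a minimal sketch (plogis() turns a log odds into a probability):

```r
beta.d <- -1.1046   # density parameter
beta.r <-  1.2608   # reciprocity parameter
# log odds for each combination of x_ij (rows) and x_ji (columns)
logodds <- matrix(c(0,      beta.d,
                    beta.d, beta.d + beta.r),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(x_ij = 0:1, x_ji = 0:1))
logodds
plogis(beta.d + beta.r)   # probability scale for the mutual-dyad cell
```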

Simulate

Set up the algorithm

sim_model  <-  sienaAlgorithmCreate( 
  projname = 'sim_model',
  cond = FALSE, 
  useStdInits = FALSE,
  nsub = 0 ,
  n3 = 100,
  simOnly = TRUE)

Run simulation

sim_ans <- siena07( sim_model, data = mydata,
                    effects = myeff,
                    returnDeps = TRUE, batch=TRUE )

Reshape and extract networks:

mySimNets <- reshapeRSienaDeps(sim_ans,n)

Investigate simulated networks

Plot, using the same code as before

par(mfrow=c(2,5),# want 2 by 5 panels
    oma = c(0,1,0,0) + 0.1,# custom outer margins
    mar = c(1,0,0,0) + 0.1)# custom inner margins
plot(as.network(tmp4),# observed network at t4
     coord=coordin) # observed network
invisible(# need to suppress printing to console
  apply(mySimNets[1:9,,],# for the first 9 networks in array
      1,# along first dimension, apply the following function
      function(x)
        plot(as.network(x),coord=coordin) ) )

Compare the observed number of reciprocated dyads to the simulated number of reciprocated dyads

hist( dyad.census(mySimNets)[,1])
abline(v = dyad.census(tmp4)[1], col='red')

With an actor preference of 1.26 for having their ties reciprocated, we reproduce the observed reciprocity

What about the triad census? Check the first 9

triad.census(tmp4)# observed
triad.census(mySimNets[1:9,,])# first 9 simulated networks

Look at the distributions for transitive triads and dense triads

par(mfrow=c(1,2))
hist( triad.census(mySimNets)[,9], xlim=c(8,40))# column 9 is 030T
abline(v = triad.census(tmp4)[9], col='red')
hist( triad.census(mySimNets)[,16], xlim=c(4,24))# column 16 is 300
abline(v = triad.census(tmp4)[16], col='red')

A preference for reciprocity is not enough to explain why reciprocated dyads hang together in triangles, or why there are so many transitive triads

Triangle model

Add another positive parameter \(\beta_t\) for the preference for closing open triads.

myeff <- includeEffects(myeff, recip,include=TRUE) # add the effect reciprocated ties
myeff <- includeEffects(myeff, transTrip,include=TRUE)
myeff$initialValue[myeff$shortName=='Rate'] <- 7.0959
myeff$initialValue[myeff$shortName=='density'][1] <- -1.6468
myeff$initialValue[myeff$shortName=='recip'][1] <- 0.8932
myeff$initialValue[myeff$shortName=='transTrip'][1] <- 0.2772

Simulate

Set up the algorithm

sim_model  <-  sienaAlgorithmCreate( 
  projname = 'sim_model',
  cond = FALSE, 
  useStdInits = FALSE,
  nsub = 0 ,
  n3 = 100,
  simOnly = TRUE)

Run simulation

sim_ans <- siena07( sim_model, data = mydata,
                    effects = myeff,
                    returnDeps = TRUE, batch=TRUE )

Reshape and extract networks:

mySimNets <- reshapeRSienaDeps(sim_ans,n)

What about the triad census? Check the first 9

triad.census(tmp4)# observed
triad.census(mySimNets[1:9,,])# first 9 simulated networks

Look at the distributions for transitive triads and dense triads

par(mfrow=c(1,2))
hist( triad.census(mySimNets)[,9], xlim=c(2,55))# column 9 is 030T
abline(v = triad.census(tmp4)[9], col='red')
hist( triad.census(mySimNets)[,16], xlim=c(4,86))# column 16 is 300
abline(v = triad.census(tmp4)[16], col='red')

Having preferences for reciprocated ties and transitive ties seems to explain ‘most’ of the structure

Estimate SAOMs

How did I pick the numbers?

  • \(\beta_d=-1.6468\)
  • \(\beta_r=0.8932\)
  • \(\beta_t=0.2772\)

Manually, you could

  1. pick values \(\beta_d^{\ast}\), \(\beta_r^{\ast}\), and \(\beta_t^{\ast}\)
  2. simulate lots of networks, and
  3. adjust the parameters:
  • increase (decrease) \(\beta_d^{\ast}\) if density is too low (high)
  • increase (decrease) \(\beta_r^{\ast}\) if reciprocity is too low (high)
  • increase (decrease) \(\beta_t^{\ast}\) if transitivity is too low (high)
  4. repeat

Luckily, we have an algorithm (stochastic approximation) for automating this

Defining model

We define the model in the same way as when we simulated data

myeff <- getEffects(mydata)# we already have our effects, but let's start from scratch
myeff <- includeEffects(myeff, recip,include=TRUE) # add the reciprocity effect (it is already included by DEFAULT though)
myeff <- includeEffects(myeff, transTrip,include=TRUE)# we want to estimate the transitivity effect

Set up the algorithm with default values (siena07 will assume you want to estimate)

est_model  <-  sienaAlgorithmCreate( 
  projname = 'est_model',
  n3 = 1000,# number of simulations in Phase 3 (default *is* 1000)
  simOnly = FALSE,# default *is* FALSE
  cond = FALSE, 
  useStdInits = FALSE)

Estimate!

est_ans <-siena07(est_model,# algorithm object
                  data = mydata, # the data object we created earlier
                  effects = myeff,# same effects as in simulation
                  returnDeps = TRUE,
                  batch=TRUE)

Estimation gives us an ANOVA table with

  • Parameter (point) estimates
  • Standard errors of estimates
  • Convergence statistics

summary( est_ans )
## Estimates, standard errors and convergence t-ratios
## 
##                                       Estimate   Standard   Convergence 
##                                                    Error      t-ratio   
##   1. rate basic rate parameter mynet1  7.0602  ( 0.9044   )   -0.0443   
##   2. eval outdegree (density)         -1.6435  ( 0.1352   )   -0.0153   
##   3. eval reciprocity                  0.9010  ( 0.2058   )    0.0222   
##   4. eval transitive triplets          0.2754  ( 0.0450   )    0.0162   
## 
## Overall maximum convergence ratio:    0.1014 
## 
## 
## Total of 1796 iteration steps.
## 
## Covariance matrix of estimates (correlations below diagonal)
## 
##        0.818       -0.015       -0.003        0.000
##       -0.119        0.018       -0.008       -0.004
##       -0.015       -0.300        0.042       -0.002
##        0.002       -0.679       -0.171        0.002
## 
## Derivative matrix of expected statistics X by parameters:
## 
##       11.962        5.487        4.120       41.958
##       71.989      230.721      157.008     1302.602
##       18.128       75.919       87.283      490.981
##      190.061      631.165      500.261     4820.576
## 
## Covariance matrix of X (correlations below diagonal):
## 
##      127.331      120.448       83.640      841.117
##        0.560      363.124      271.906     2338.000
##        0.462        0.889      257.867     1930.051
##        0.555        0.913        0.895    18046.277

These estimates look very much like the numbers that I picked arbitrarily - so what makes these better?

The observed statistics that we want to ‘hit’ on average are

est_ans$targets
## [1] 125 175  92 431
# the same as sim_ans$targets

The simulated statistics for the parameters from the estimation are

colMeans(est_ans$sf2)
##       [,1]    [,2]   [,3]    [,4]
## [1,] 124.5 174.708 92.356 433.172
# also provided in:
# est_ans$estMeans

The simulated statistics for the parameters from the simulation are

sim_ans$estMeans
## [1] 123.55094 174.01573  90.72833 419.69273

We can calculate within how many standard deviation units of the targets we are for each statistic \[ \frac{\bar{z}_k-z_{obs}}{sd(z_k)} \] The deviations using the estimates from estimation:

(colMeans(est_ans$sf2)-est_ans$targets)/sqrt(diag(var(est_ans$sf2[,1,])))
##             [,1]        [,2]      [,3]       [,4]
## [1,] -0.04431006 -0.01532341 0.0221693 0.01616836
# Also provided in:
# est_ans$tstat

For estimates from simulation:

(colMeans(sim_ans$sf2)-sim_ans$targets)/sqrt(diag(var(sim_ans$sf2[,1,])))

For both sets of parameters, the simulated statistics are on average within less than 0.1 standard deviation units of the observed statistics

As a quick illustration, we can see what happens when we set the rate, density, and reciprocity parameters to the estimated values

myeff$initialValue[myeff$shortName=='Rate'] <- est_ans$theta[1]
myeff$initialValue[myeff$shortName=='density'][1] <- est_ans$theta[2]
myeff$initialValue[myeff$shortName=='recip'][1] <-est_ans$theta[3]

but pick another value for transitivity:

myeff$initialValue[myeff$shortName=='transTrip'][1] <- 0.1

and then, simulate!

sim_model  <-  sienaAlgorithmCreate( 
  projname = 'sim_model',
  cond = FALSE, 
  useStdInits = FALSE,
  nsub = 0 ,
  n3 = 1000,
  simOnly = TRUE)
sim_ans <- siena07( sim_model, data = mydata,
                    effects = myeff,
                    returnDeps = TRUE, batch=TRUE )

Calculate the scaled difference between the average statistics and the observed statistics again

(colMeans(sim_ans$sf2)-sim_ans$targets)/sqrt(diag(var(sim_ans$sf2[,1,])))

The average simulated statistics:

sim_ans$estMeans

are not close to the observed

sim_ans$targets

Decreasing the transitivity parameter results in networks having too few transitive triads

Convergence check

In general, we say that the estimation has converged and that the estimates can be interpreted if

  1. Convergence t-ratios \(| t_{conv} | < 0.1\), and
  2. Overall maximum convergence ratio less than \(0.25\)

Summary of estimation

The aim of the estimation algorithm is to find estimates \(\hat{\theta}\), such that \[ \underbrace{E_{\hat{\theta}}\{ Z[X(t_1)] \mid X(t_0)=x_{obs}(t_0)\}}_{\text{'average' statistics}}=Z[x_{obs}(t_1)] \] The stochastic approximation algorithm in siena07 solves this equation computationally in three Phases:

  1. Phase 1: Determining how big a change you get in \(E_{\theta}\{ Z(X) \mid x_{obs}(t_0)\}\), to calibrate updates to \(\theta\). This phase determines \(D\)
  2. Phase 2: The main estimation phase where you incrementally update the current values \(\theta^{(r)}\) to \(\theta^{(r+1)}\) \[ \theta^{(r+1)} = \theta^{(r)} - a_r D^{-1}(z_r-z_{obs}) \] where \(z_r=z(X_r)\), and \(X_r\) is simulated from \(x_{obs}(t_0)\) with parameter values \(\theta^{(r)}\). The final values (in practice, averages over a subphase) are the estimates \(\hat{\theta}\)
  3. Phase 3: A large number of networks \(x^{(1)},\ldots,x^{(n_3)}\) are simulated using \(\hat{\theta}\) to calculate \(\bar{z}\) and \(sd(z)\), in order to calculate convergence t-ratios. In addition, the standard errors \(se(\hat{\theta})\) of the estimates \(\hat{\theta}\) are calculated.

You may notice that the difference that is minimised in this algorithm is not \(\bar{z}-z_{obs}\). Only one network is simulated in each iteration - but it works (the other way also works but is less efficient)
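The Phase 2 update can be illustrated on a toy problem: find \(\theta\) such that \(E_\theta[Z]=z_{obs}\) when \(Z \sim N(\theta,1)\), using a single noisy draw per iteration. This is only a sketch of the general Robbins-Monro idea; the gain sequence \(a_r=1/r\), \(D=1\), and the number of iterations are made-up choices for illustration, not RSiena's internal settings:

```r
set.seed(2)
z.obs <- 3        # the 'observed' statistic we want to hit on average
theta <- 0        # arbitrary starting value
D <- 1            # derivative of E[Z] with respect to theta (exactly 1 here)
for (r in 1:2000) {
  z.r <- rnorm(1, mean = theta)                  # one simulated statistic
  a.r <- 1 / r                                   # decreasing gains
  theta <- theta - a.r * (1 / D) * (z.r - z.obs) # Robbins-Monro update
}
theta             # close to z.obs = 3
```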

Standard errors

The second column in the ANOVA table contains the nominal standard errors, i.e. the approximate standard deviations of the estimators of the \(\theta\)’s. Typically, these are used for standard hypothesis tests:

For effect/parameter \(k\), test \[ H_0:\beta_k=\beta_{k,0}=0 \] against \[ H_1:\beta_k\neq 0 \] The test statistic \[ \frac{\hat{\beta}_k-\beta_{k,0} }{sd(\hat{\beta}_k)}=\frac{\hat{\beta}_k }{sd(\hat{\beta}_k)}\approx \frac{\hat{\beta}_k }{se(\hat{\beta}_k)}, \] is approximately standard normally distributed, \(N(0,1)\), if \(H_0\) is true.

Given the number of assumptions and approximations we need to make, I would advise that we stick to testing at the nominal 5% level, and reject \(H_0\) when the test statistic is greater than \(2\) in absolute value.

Test of simple model

In the simple model we estimated, let us test \(H_0:\beta_t=0\). The observed test statistic

est_ans$theta[4]/est_ans$se[4]
## [1] 6.113833

is considerably larger than 2, and we can reject the null-hypothesis that the true value of the transitivity parameter is \(0\).
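Since the statistic is approximately \(N(0,1)\) under \(H_0\), a two-sided p-value can be read off with pnorm(); a quick sketch with the t-value hard-coded from above:

```r
t.stat <- 6.113833                  # estimate / standard error, from above
p.value <- 2 * pnorm(-abs(t.stat))  # two-sided p-value under the normal approximation
p.value                             # far below 0.05
```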

Score-type test

Intuitively, we could also test if we ‘need’ \(\beta_t\), by estimating the model with \(\beta_t=0\), and check the convergence t-statistic. You can do this yourself now!

Testing attribute

There are many different types of covariates

Type                           RSiena type
Constant monadic covariate     coCovar
Changing monadic covariate     varCovar
Constant dyadic covariate      coDyadCovar
Changing dyadic covariate      varDyadCovar
Changing (covariate) network   sienaDependent

The usual nomenclature for measurement scales, and the distinction between metric and non-metric variables, applies.

Model with attributes

The adjacency matrices s501, s502, and s503, for the so-called s-50 dataset, the network of 50 girls, are also available with RSiena. For a full description of this dataset see ?s50 or http://www.stats.ox.ac.uk/~snijders/siena/s50_data.htm

Rename adjacency matrices

For clarity, we rename the adjacency matrices

friend.data.w1 <- s501
friend.data.w2 <- s502
friend.data.w3 <- s503

Note that we have three (3) waves.

Among the many monadic covariates are ‘smoking’ and ‘drinking’:

drink <- s50a
smoke <- s50s

Here, each variable has its own \(n \times 3\) data matrix, with one observation for each individual at each time-point

head(smoke)

We are going to test \[ H_0: \text{smokers have as many friends as non-smokers} \] against an alternative hypothesis stating that there is a difference. We will interpret this as smokers “sending as many ties” and “receiving as many ties” as non-smokers.


Formatting covariates

We will use smoking at \(t_0\) as our explanatory variable and define ‘smoking’ as a value of 2 or 3.

smoke1.matrix <- as.numeric( smoke[,1]>1 )
table(smoke1.matrix, useNA='always')

To tell RSiena how to use this variable, we format it

smoke1 <- coCovar( smoke1.matrix )

Print to screen to see how R has interpreted the variable

smoke1

Format DV

Format the dependent network variable as before:

friendshipData <- array( c( friend.data.w1, 
                            friend.data.w2, 
                            friend.data.w3 ),
                         dim = c( 50, 50, 3 ) )
friendship <- sienaDependent(friendshipData)

sienaDataCreate

Using sienaDataCreate, we give RSiena the opportunity to figure out how the data are structured and what types of effects can be calculated from them

s50data <- sienaDataCreate( friendship, smoke1)

getEffects

Since we have both a network and monadic covariates, we will have more effects available to us than previously

s50.effects <- getEffects(s50data)

Sender effect

For testing our hypothesis we want a statistic that corresponds to \[ v_i x_{ij} \] for each tie-variable, and where \(v_i\) is one or zero according to whether \(i\) is a smoker or not, respectively. This is the sender effect

s50.effects <- includeEffects( s50.effects,# we "add an effect" to our effects object
                               egoX,# the shortname here evokes that variable for 'ego' affects decision
                               interaction1 = "smoke1" # the variable we want to interact the DV with
                               )

Receiver effect

Next, we want a statistic that corresponds to \[ v_j x_{ij} \] so that if the corresponding parameter is positive, then actors are more likely to form (maintain) ties to actors \(j\) that have \(v_j=1\), i.e. are smokers

s50.effects <- includeEffects( s50.effects,
                               altX,# the shortname here evokes that variable for 'alter' affects decision of ego
                               interaction1 = "smoke1" # the variable we want to interact the DV with
                               )
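The two statistics have simple matrix forms: the ego (sender) statistic is \(\sum_{i,j} v_i x_{ij}\) and the alter (receiver) statistic is \(\sum_{i,j} v_j x_{ij}\). A base-R sketch on a toy network (ignoring the mean-centring of covariates that RSiena applies internally):

```r
v <- c(1, 0, 1)                      # toy smoking indicator for 3 actors
x <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)   # toy adjacency matrix
ego.stat   <- sum(v * rowSums(x))    # sum_i v_i x_ij: smokers' outgoing ties
alter.stat <- sum(v * colSums(x))    # sum_j v_j x_ij: ties received by smokers
c(ego = ego.stat, alter = alter.stat)
```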

Transitivity

For the van de Bunt dataset, we saw that triadic closure was important. We can choose to include it because we believe it is important, but also as a control for our hypothesised effects

s50.effects <- includeEffects(s50.effects,# we "add an effect" to s50.effects
                              transTrip,# shortname of the effect
                              include=TRUE)

Specified model

Our specified model is

s50.effects
##   effectName                          include fix   test  initialValue parm
## 1 constant friendship rate (period 1) TRUE    FALSE FALSE    4.69604   0   
## 2 constant friendship rate (period 2) TRUE    FALSE FALSE    4.32885   0   
## 3 outdegree (density)                 TRUE    FALSE FALSE   -1.46770   0   
## 4 reciprocity                         TRUE    FALSE FALSE    0.00000   0   
## 5 transitive triplets                 TRUE    FALSE FALSE    0.00000   0   
## 6 smoke1 alter                        TRUE    FALSE FALSE    0.00000   0   
## 7 smoke1 ego                          TRUE    FALSE FALSE    0.00000   0

sienaAlgorithmCreate

Specify the simulation settings

s50algorithm.simple <- sienaAlgorithmCreate( projname = 's50_simple' )# We are only using defaults

Estimate

Estimating the model with default settings is simply a call to siena07

s50.simple.ans <- siena07( s50algorithm.simple,# estimation settings
                  data = s50data,# data object that knows DV and IV
                  effects = s50.effects,# the effects we specified
                  batch = TRUE,
                  returnDeps = TRUE )

Results

Print the results to screen

summary( s50.simple.ans )
## Estimates, standard errors and convergence t-ratios
## 
##                                    Estimate   Standard   Convergence 
##                                                 Error      t-ratio   
## 
## Rate parameters: 
##   0.1      Rate parameter period 1  6.4444  ( 1.0722   )             
##   0.2      Rate parameter period 2  5.2198  ( 0.8698   )             
## 
## Other parameters: 
##   1.  eval outdegree (density)     -2.6794  ( 0.1181   )   -0.0411   
##   2.  eval reciprocity              2.4578  ( 0.2102   )   -0.0353   
##   3.  eval transitive triplets      0.6177  ( 0.0755   )   -0.0310   
##   4.  eval smoke1 alter             0.0055  ( 0.1824   )    0.0398   
##   5.  eval smoke1 ego               0.0471  ( 0.1974   )   -0.0509   
## 
## Overall maximum convergence ratio:    0.1362 
## 
## 
## Total of 2203 iteration steps.
## 
## Covariance matrix of estimates (correlations below diagonal)
## 
##        0.014       -0.016       -0.004       -0.003        0.001
##       -0.635        0.044       -0.003        0.003        0.003
##       -0.410       -0.164        0.006        0.000       -0.002
##       -0.152        0.083       -0.019        0.033       -0.012
##        0.031        0.084       -0.118       -0.325        0.039
## 
## Derivative matrix of expected statistics X by parameters:
## 
##      316.194      251.881      545.392        7.338        4.700
##      141.860      148.192      286.710        0.569        0.031
##      366.271      331.308     1191.570       11.931       12.269
##       14.844       10.751       26.358       43.201       22.967
##        0.915       -1.618        2.579       24.688       38.514
## 
## Covariance matrix of X (correlations below diagonal):
## 
##      506.789      439.782     1203.778        2.808        3.903
##        0.936      435.173     1159.266        3.008        2.734
##        0.807        0.839     4386.436        1.996        0.741
##        0.016        0.019        0.004       58.793       42.313
##        0.024        0.018        0.002        0.757       53.139

Are all convergence statistics less than 0.1 in absolute value? If so, we can test our hypothesis - do we reject our \(H_0\)?

Adding homophily

To account for (test) possible assortativity on smoking, we add the homophily effect:

s50.effects <- includeEffects( s50.effects,# we "add an effect" to our effects object
                               egoXaltX,# the shortname
                               interaction1 = "smoke1" # the variable we want to interact the DV with
                               )

Re-estimate

s50.hom.ans <- siena07( s50algorithm.simple,# estimation settings
                  data = s50data,# data object that knows DV and IV
                  effects = s50.effects,# the effects we specified
                  batch = TRUE,
                  returnDeps = TRUE )

Results

Print the results to screen

summary( s50.hom.ans )
## Estimates, standard errors and convergence t-ratios
## 
##                                      Estimate   Standard   Convergence 
##                                                   Error      t-ratio   
## 
## Rate parameters: 
##   0.1      Rate parameter period 1    6.5620  ( 1.1765   )             
##   0.2      Rate parameter period 2    5.2829  ( 0.8868   )             
## 
## Other parameters: 
##   1.  eval outdegree (density)       -2.6963  ( 0.1103   )   -0.0655   
##   2.  eval reciprocity                2.4384  ( 0.1884   )   -0.0420   
##   3.  eval transitive triplets        0.6120  ( 0.0720   )   -0.0406   
##   4.  eval smoke1 alter              -0.0870  ( 0.2002   )   -0.0799   
##   5.  eval smoke1 ego                -0.0078  ( 0.1958   )   -0.0721   
##   6.  eval smoke1 ego x smoke1 alter  0.6426  ( 0.3611   )   -0.0773   
## 
## Overall maximum convergence ratio:    0.1190 
## 
## 
## Total of 2149 iteration steps.
## 
## Covariance matrix of estimates (correlations below diagonal)
## 
##        0.012       -0.012       -0.003        0.000        0.001       -0.006
##       -0.594        0.035       -0.003        0.003        0.002       -0.002
##       -0.356       -0.185        0.005       -0.001       -0.002       -0.001
##       -0.016        0.077       -0.044        0.040       -0.012       -0.018
##        0.027        0.049       -0.117       -0.310        0.038       -0.015
##       -0.142       -0.024       -0.028       -0.251       -0.206        0.130
## 
## Derivative matrix of expected statistics X by parameters:
## 
##      292.885      231.918      484.151        5.010        7.566       20.361
##      130.474      141.767      257.600        0.879        2.436        9.478
##      339.930      315.561     1150.329       14.733       20.300       29.838
##        8.007        6.328       28.606       43.381       27.393       10.599
##       10.764       10.420       45.175       30.361       44.497       11.262
##       23.603       20.690       49.639       10.349       10.196       13.801
## 
## Covariance matrix of X (correlations below diagonal):
## 
##      478.754      412.769     1130.343        6.258       12.087       41.163
##        0.934      408.305     1087.786        8.510       14.420       38.583
##        0.805        0.839     4121.622       31.279       51.322      108.502
##        0.035        0.052        0.060       65.309       53.015       17.954
##        0.069        0.089        0.100        0.820       64.074       18.485
##        0.420        0.426        0.377        0.496        0.516       20.050

Interpretation

If

  1. Convergence t-ratios \(| t_{conv} | < 0.1\), and
  2. Overall maximum convergence ratio less than \(0.25\)

we can test our hypothesis, controlling for possible assortativity/homophily

What about homophily on smoking - do smokers befriend other smokers?